TIME SERIES FORECASTING MODELS
INVOLVING POWER TRANSFORMATIONS


Abstract 


In  this  paper  we  discuss  procedures  for  overcoming  some  of  the 
problems involved in fitting autoregressive integrated moving average
forecasting models to time series data, when the possibility of
incorporating an instantaneous power transformation of the data into the
analysis  is  contemplated.  The  procedures  are  illustrated  using  series 
of  quarterly  observations  on  corporate  earnings  per  share. 


1.  Power  Transformations  and  ARIMA  Models 

Box  and  Jenkins  (1970)  described  in  detail  a  methodology  for  fitting 
to an observed time series, $X_t$, an ARIMA(p,d,q) model

$$(1 - \phi_1 B - \cdots - \phi_p B^p)(1 - B)^d X_t = (1 - \theta_1 B - \cdots - \theta_q B^q) a_t \tag{1.1}$$

where B is a back-shift operator on the index of the time series, so that
$B^j X_t = X_{t-j}$. In (1.1), $a_t$ is taken to be a zero-mean, fixed variance,
non-autocorrelated  process,  known  as  "white  noise".  For  seasonal  time 
series,  with  period  s,  a  multiplicative  seasonal  ARIMA  model  of  the  form 

$$(1 - \phi_1 B - \cdots - \phi_p B^p)(1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps})(1 - B)^d (1 - B^s)^D X_t = (1 - \theta_1 B - \cdots - \theta_q B^q)(1 - \Theta_1 B^s - \cdots - \Theta_Q B^{Qs}) a_t \tag{1.2}$$

is  frequently  fitted.  Box  and  Jenkins  discussed,  in  detail,  an  iterative 
model  building  strategy,  involving  model  selection,  estimation  and  checking, 
for  fitting  to  data  models  of  the  class  (1.1)  or  (1.2).  At  the  selection 
stage,  based  on  statistics  calculated  from  the  data,  a  specific  model 
from  the  general  class  is  chosen  for  subsequent  analysis.  Next,  using 
efficient  statistical  methods,  the  unknown  parameters  of  the  initially 
selected model are estimated. Finally, checks on the adequacy of
representation of the chosen model to the data are carried out. Any inadequacies
revealed at this stage may suggest an alternative model, and the model
building  cycle  is  iterated  until  a  satisfactory  form  is  achieved.  Box 
and  Jenkins  show  how  forecasts  of  future  values  of  the  time  series  can 
be  obtained  from  a  fitted  model. 
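
As a small illustration of the differencing operators that appear in (1.1) and (1.2), the following sketch applies (1-B)^d and (1-B^s)^D to a list of observations. The function names and the toy series are ours, introduced only for illustration.

```python
def difference(series, lag=1):
    """Apply (1 - B^lag): return x[t] - x[t-lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def arima_differences(series, d=0, D=0, s=4):
    """Apply (1 - B)^d (1 - B^s)^D, as on the left-hand side of (1.2).

    s=4 is chosen here because the series discussed are quarterly.
    """
    out = list(series)
    for _ in range(d):      # d non-seasonal differences
        out = difference(out, 1)
    for _ in range(D):      # D seasonal differences at period s
        out = difference(out, s)
    return out


# Toy series (not data from the paper).
x = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0, 29.0]
print(difference(x))                         # first differences
print(arima_differences(x, d=1, D=1, s=4))   # (1 - B)(1 - B^4) x
```

Each application of a difference at lag j shortens the series by j observations, which is why heavily differenced quarterly series leave fewer values for computing sample autocorrelations.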

Box  and  Jenkins  discuss  very  briefly,  as  a  possibility  for  obtaining 
a  model  with  homogeneous  error  variance,  fitting  ARIMA  models,  not  necessarily 
to the original series, but to a series derived from a member of the
class of power transformations analysed by Box and Cox (1964). In this
more general model, $X_t$ in (1.1) or (1.2) is replaced by $X_t^{(\lambda)}$, where

$$X_t^{(\lambda)} = \begin{cases} (X_t^{\lambda} - 1)/\lambda & (\lambda \neq 0) \\ \log X_t & (\lambda = 0) \end{cases} \tag{1.3}$$

and $\lambda$ is regarded as an extra parameter to be estimated. Interest in this
model  was  perhaps  first  stimulated  by  the  discussion  following  Chatfield 
and Prothero (1973), particularly the comments of Box and Jenkins (1973).
Chatfield  and  Prothero  analysed  a  series  of  monthly  sales  data.  After 
first  taking  logarithms  of  the  observations,  these  authors  built,  following 
the  strategy  of  Box  and  Jenkins,  a  seasonal  ARIMA  model.  However,  the 
forecasting  performance  of  the  achieved  model  was  felt  to  be  unsatisfactory. 
Several  discussants  of  this  paper  suggested  that  this  was  a  result  of 
the  inappropriateness  of  the  logarithmic  transformation,  and  that  superior 
forecasts could be obtained if the more general class of power
transformations were to be incorporated in the model. Subsequently, in a book
of  case  studies,  Jenkins  (1979)  has  emphasised  the  potential  utility  of 
the  power  transformation  in  building  time  series  forecasting  models. 
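
The power transformation (1.3) and its inverse can be sketched in a few lines. The function names below are ours, and the sketch assumes strictly positive observations, which the transformation requires for general values of the parameter.

```python
import math


def box_cox(x, lam):
    """Power transformation (1.3) of a single positive observation."""
    if x <= 0:
        raise ValueError("the power transformation needs x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def inverse_box_cox(z, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)


# lam = 1 only shifts the data; lam near 0 approaches the logarithm.
print(box_cox(2.0, 1.0))
print(box_cox(2.0, 1e-8), math.log(2.0))
```

The logarithm is the limiting member of the family as the parameter tends to zero, which is why (1.3) treats it as a special case rather than a separate transformation.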

In  the  remainder  of  this  paper  we  will  discuss  procedures  for 
fitting ARIMA forecasting models, allowing for the possibility of
instantaneous power transformations. In particular we will discuss necessary
modifications  to  the  usual  selection,  estimation,  checking  and  forecasting 
procedures.  Our  interest  in  this  problem  arose  from  a  study  of  a  large 
collection  of  quarterly  time  series  of  corporate  earnings  per  share,  the 
results of which are reported in Hopwood et al (1981). A good deal of
recent interest in the accounting literature has focussed on procedures
for forecasting such series. The Financial Accounting Standards Board
(1978), in its conceptual framework project, has emphasized the importance
of earnings forecasts. The construction of ARIMA forecasting models for
earnings series has been discussed by, for example, Foster (1977), Griffin
(1977) and Lorek (1979). Much of this research has concentrated on two
questions:  do  corporate  earnings  streams  have  a  common  structure?  (that 
is,  can  one  find  a  single  model  from  the  general  autoregressive  integrated 
moving  average  class  which  predicts  well  for  a  wide  range  of  corporations?) ; 
and,  how  do  the  forecasts  from  time  series  models  compare  with  those  of 
financial analysts and management? Some discussion on the latter point
is contained in Abdel-khalik and Thompson (1977-78), Brown and Rozeff
(1978) and Collins and Hopwood (1980).

Although  the  point  had  not  previously  been  noted  in  the  accounting 
literature,  it  became  clear,  in  the  early  stages  of  our  study,  that,  for 
a great many series in our sample, there was strong evidence of the
desirability of a data transformation to induce homogeneity of error variance.
It  was  in  response  to  this  phenomenon  that  we  examined  the  problems  to  be 
discussed  in  subsequent  sections  of  this  paper. 

2.  Model  Selection 

Following Box and Jenkins (1970), specific models from the general
classes  (1.1)  or  (1.2)  have  generally  been  chosen  on  the  basis  of  sample 
autocorrelations  and  partial  autocorrelations  of  a  series  and  its  low 
order  differences.  However,  when  we  further  consider  the  possibility 
of an instantaneous power transformation, the initial choice of a model
is complicated by the fact that the autocorrelation structure of the
transformed series, $X_t^{(\lambda)}$, and its differences, is not independent of the
choice of the transformation parameter $\lambda$ of (1.3). Thus, for example,
if  the  sample  autocorrelations  and  partial  autocorrelations  of  the 
raw  data  and  its  differences  are  employed  in  the  usual  way  to  suggest 
values  for  p,  d  and  q  in  (1.1),  the  chosen  model  may  not  be  adequate  to 
describe the linear properties of $X_t^{(\lambda)}$ for an "appropriate" $\lambda$. This
point  is  established  theoretically  by  Granger  and  Newbold  (1976),  while 
a  numerical  example  in  Nelson  and  Granger  (1979)  shows  that  it  can  be 
practically  important. 
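
The dependence of the sample autocorrelations on the transformation can be seen in a short sketch. The code below computes sample autocorrelations, and partial autocorrelations by the Durbin-Levinson recursion, for the first differences of a synthetic series and of its logarithm. The series is simulated purely for illustration (it is not earnings data), and the function names are ours.

```python
import math
import random


def sample_acf(series, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def sample_pacf(acf):
    """Partial autocorrelations from r_1, ..., r_m (Durbin-Levinson)."""
    pacf, prev = [], []
    for k in range(1, len(acf) + 1):
        if k == 1:
            phi_kk = acf[0]
            cur = [phi_kk]
        else:
            num = acf[k - 1] - sum(prev[j] * acf[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * acf[j] for j in range(k - 1))
            phi_kk = num / den
            cur = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            cur.append(phi_kk)
        pacf.append(phi_kk)
        prev = cur
    return pacf


# Synthetic positive series whose variability grows with its level.
rng = random.Random(0)
level, raw = 1.0, []
for _ in range(96):
    level *= math.exp(0.03 + 0.15 * rng.gauss(0.0, 1.0))
    raw.append(level)

logged = [math.log(v) for v in raw]
d_raw = [raw[t] - raw[t - 1] for t in range(1, len(raw))]
d_log = [logged[t] - logged[t - 1] for t in range(1, len(logged))]

for name, d in (("raw", d_raw), ("log", d_log)):
    acf = sample_acf(d, 4)
    print(name, [round(r, 2) for r in acf], [round(p, 2) for p in sample_pacf(acf)])
```

Comparing the two printed rows shows how the same data can suggest different low-order structure depending on the scale on which the autocorrelations are computed.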

Of  course,  the  analyst  is  not  irretrievably  committed  to  the  initially 
chosen  model.  It  is  possible  that  any  seriously  inadequate  specification 
will  be  detected  at  the  model  checking  stage,  and  subsequently  rectified. 
However, it is certainly a sensible strategy to seek as reliable an initial
specification as possible. Accordingly, we have found it valuable to
work with an elaboration of the usual model selection procedure, based
on a preliminary estimate of the transformation parameter $\lambda$. Our approach
is  based  on  the  approximation  of  the  underlying  true  model,  by  an  auto- 
regressive  model  of  moderate  order,  since  pure  autoregressive  models  are 
inexpensively  estimated. 

The preliminary estimate, $\lambda^*$, of $\lambda$ is obtained by estimating by
least squares, for a grid of values of $\lambda$, the kth order autoregressions

$$X_t^{(\lambda)} = \sum_{j=1}^{k} \beta_j X_{t-j}^{(\lambda)} + e_t \tag{2.1}$$

where, in (2.1), $e_t$ is an error term. Provided k is chosen sufficiently
large, the usual residual variance, $\hat{\sigma}^2_{e,\lambda}$, derived from fitting (2.1),
provides a reasonable estimate of the variance of the white noise error,
$a_t$, of (1.1) or (1.2). In practice, for the earnings series we examined,
it  was  found  that  fixing  k  at  8  was  adequate  for  our  purposes.  A  further 
elaboration  that  might  prove  useful  would  base  the  choice  of  autoregressive 
order  on  some  automatic  criterion,  such  as  AIC  (Akaike  1974)  or  CAT  (Parzen 
1974). The initial estimate of $\lambda$ is then that value $\lambda^*$ which, over the
grid of chosen values, maximizes

$$g(\lambda) = -\frac{n}{2} \log \hat{\sigma}^2_{e,\lambda} + (\lambda - 1) \sum_{t=1}^{n} \log X_t \tag{2.2}$$

where n is the length of the series, and the second term on the right
hand side of (2.2) is the logarithm of the Jacobian of the transformation
(1.3).

Sample autocorrelations and partial autocorrelations are then
calculated for $X_t^{(\lambda^*)}$ and its appropriate differences, and these are
employed in the usual way to select an appropriate model. This modification
is  very  easily  incorporated  into  existing  model  selection  routines  and,  since 
(2.1)  is  estimated  by  ordinary  least  squares,  the  additional  computational 
cost  is  very  small. 
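
The preliminary estimation step can be sketched as follows: for each grid value, transform the series by (1.3), fit the autoregression (2.1) by ordinary least squares, and evaluate the criterion (2.2). This is a minimal sketch under stated assumptions, not the authors' program. In particular, the grid range of -1 to 1, the absence of an intercept, the divisor used for the residual variance, and all function names are our assumptions.

```python
import math
import random


def box_cox(x, lam):
    """Power transformation (1.3)."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def ar_residual_variance(z, k):
    """OLS fit of z_t on z_{t-1}, ..., z_{t-k}; returns RSS / (n - k).

    No intercept is included and the divisor is the number of fitted
    observations; both choices are assumptions of this sketch.
    """
    rows = [[z[t - j] for j in range(1, k + 1)] for t in range(k, len(z))]
    y = z[k:]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((yt - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yt in zip(rows, y))
    return rss / len(y)


def g(series, lam, k):
    """Criterion (2.2): scaled log residual variance plus log Jacobian."""
    z = [box_cox(x, lam) for x in series]
    n = len(series)
    jacobian = (lam - 1.0) * sum(math.log(x) for x in series)
    return -0.5 * n * math.log(ar_residual_variance(z, k)) + jacobian


def preliminary_lambda(series, k=8, grid=None):
    """Grid value maximizing g; default grid has width 0.05 on [-1, 1]."""
    if grid is None:
        grid = [round(-1.0 + 0.05 * i, 2) for i in range(41)]
    return max(grid, key=lambda lam: g(series, lam, k))


# Synthetic positive series of length 96 (not data from the paper).
rng = random.Random(1)
level, series = 1.0, []
for _ in range(96):
    level *= math.exp(0.02 + 0.1 * rng.gauss(0.0, 1.0))
    series.append(level)
print(preliminary_lambda(series, k=8))
```

Because each grid point needs only one small least squares fit, the whole search costs a few dozen regressions, which is consistent with the remark that the added computation is very small.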

To  illustrate  our  approach,  we  analyse  series  of  96  quarterly  earnings 
figures  for  two  corporations,  Weyerhaeuser  Inc.  and  Freeport  Minerals. 
Fitting  autoregressions  (2.1)  with  k  fixed  at  8,  using  the  criterion 
(2.2),  and  searching  over  a  grid  of  width  0.05,  yielded  respective  initial 
transformation parameter estimates $\lambda^*$ of -0.19 and -0.54 for the two
series.  For  the  Weyerhaeuser  data,  the  sample  autocorrelations  of  the 
transformed  series  indicated  that  a  single  non-seasonal  differencing  seemed 
to be sufficient to induce stationarity. The first twelve sample
autocorrelations and partial autocorrelations of the differenced series are
shown in the upper third of exhibit 1. For comparison, the middle third
of this table shows the same quantities for the first differences of the
untransformed series. It is noticeable that the magnitudes and patterns
of the two sets of sample autocorrelations are quite different, particularly
for low lags. Thus, it is doubtful that the same model would be identified
had the initial estimate of the transformation parameter not been obtained.
Using  the  figures  in  the  upper  third  of  exhibit  1,  we  tentatively  entertain 
the  model 

$$(1 - \Phi B^4)(1 - B) X_t^{(\lambda)} = (1 - \theta B) a_t \tag{2.3}$$

In fact, when this model was estimated, we obtained, as the maximum
likelihood estimate of the transformation parameter, $\hat{\lambda} = -0.28$. The lower
third of exhibit 1 shows the sample autocorrelations and partial
autocorrelations of the first differences of $X_t^{(\hat{\lambda})}$. These are very close to
those  in  the  upper  third  of  the  table,  suggesting  that  the  initial  estimate 
of  the  transformation  parameter  provides  an  adequate  basis  for  model 
selection. 


Insert  Exhibit  1  about  here 


For  the  Freeport  Minerals  series,  the  sample  autocorrelations  of 
the initially transformed data indicated the desirability of both a
non-seasonal and seasonal differencing factor to induce stationarity. The
upper  third  of  exhibit  2  shows  the  first  twelve  sample  autocorrelations 
and  partial  autocorrelations  for  the  appropriately  differenced  series. 
The  middle  third  of  this  table  shows  the  corresponding  quantities  for 
the  untransformed  series.  The  most  important  difference  between  these 
two sets of statistics is at lag 1, where, for the untransformed series,
the  sample  autocorrelation  is  very  small.  From  the  upper  third  of  the 
table,  we  tentatively  identified  the  model 

$$(1 - B)(1 - B^4) X_t^{(\lambda)} = (1 - \theta B)(1 - \Theta B^4) a_t \tag{2.4}$$

When the model (2.4) was estimated by maximum likelihood, the estimate of
the transformation parameter was $\hat{\lambda} = -0.39$. The lower third of exhibit 2
shows the sample autocorrelations and partial autocorrelations for
$(1 - B)(1 - B^4) X_t^{(\hat{\lambda})}$. Once again these are very close to the figures in the
upper  third  of  the  table,  suggesting  that  our  procedure  provides  a  sound 
basis  for  model  selection. 


Insert  Exhibit  2  about  here 


Taken  together  with  our  experience  in  analysing  other  data  sets, 
these  examples  suggest  that  our  proposed  model  selection  strategy  can 
be  very  useful  when  transformations  are  employed. 

3.  Parameter  Estimation 

Autoregressive-moving average models are most commonly estimated
through  one  or  other  of  the  two  least  squares  procedures  described  by 
Box  and  Jenkins  (1970).  Ansley  et  al  (1977)  show  how  to  extend  these 
procedures  to  deal  with  models  involving  power  transformations.  More 
recently,  however,  interest  has  centered  on  exact  maximum  likelihood 
estimation.  A  closed  form  expression  for  the  likelihood  function  was 
given by Newbold (1974), while Ansley (1979) presents a computationally
efficient algorithm. For the models (1.1) and (1.2) simulation evidence
in Ansley and Newbold (1980) suggests that maximum likelihood estimation
may  be  preferable  to  least  squares,  particularly  in  seasonal  models. 

Accordingly,  we  employ  exact  maximum  likelihood  to  estimate  our 
models.  A  convenient  algorithm  can  be  derived  by  incorporating  the 
approach of Ansley et al (1977) into the framework of Ansley (1979).
The  details  are  very  straightforward,  but  algebraically  tedious,  and  so 
will  not  be  set  out  here.  The  likelihood  function  can  be  maximized 
numerically.  Estimated  standard  errors  for  the  parameter  estimates  are 
obtained,  in  the  usual  way,  from  the  estimated  information  matrix. 

For the model (2.3), fitted to the Weyerhaeuser data, the parameter
estimates (with estimated standard errors in brackets) were

$$\hat{\Phi} = 0.38\ (0.10); \quad \hat{\theta} = 0.29\ (0.11); \quad \hat{\lambda} = -0.28\ (0.18)$$

For the model (2.4) for earnings of Freeport Minerals, our estimates were

$$\hat{\theta} = 0.23\ (0.11); \quad \hat{\Theta} = 0.88\ (0.09); \quad \hat{\lambda} = -0.39\ (0.21)$$
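
As a rough reading of the two transformation parameter estimates and their standard errors, and assuming that the usual large-sample normal approximation applies (our assumption; the text does not report intervals), approximate 95 percent intervals for the transformation parameter, written $\hat{\lambda}$ here, are

$$-0.28 \pm 1.96\,(0.18) = (-0.63,\ 0.07), \qquad -0.39 \pm 1.96\,(0.21) = (-0.80,\ 0.02)$$

for Weyerhaeuser and Freeport Minerals respectively. On this approximation both intervals exclude the value 1, which corresponds to no transformation, and only narrowly include the value 0, which corresponds to the logarithmic transformation.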

4.  Model  Checking 

Checks  on  the  adequacy  of  representation  of  fitted  models  of  the 
form  (1.1)  or  (1.2)  have  generally,  following  Box  and  Jenkins  (1970), 
proceeded  along  one  of  two  lines.  More  elaborate  models  can  be  considered 
by  testing  against  an  alternative  involving  additional  parameters.  Also, 
the assumption that the error terms, $a_t$, are white noise can be checked
through  examination  of  the  residual  autocorrelations  from  the  fitted  model. 
In  fact,  as  noted  for  example  by  Newbold  (1980),  these  two  approaches  to 
model  checking  are  not  necessarily  distinct.  The  same  tests  may  result 
whichever  perspective  is  adopted.  Recent  developments  in  time  series 
model  checking  are  surveyed  in  Newbold  (1982). 


When  power  transformations  are  incorporated  in  the  model,  no  new 
principles  are  involved  in  developing  appropriate  checks  on  model 
adequacy.  Once  again,  we  can  fit  a  more  elaborate  model  or  examine  the 
residual  autocorrelations  from  the  estimated  models.  Exhibits  3  and  4 
show  the  residual  autocorrelations  from  the  ARIMA  models  estimated  for 
earnings  per  share  of  Weyerhaeuser  Inc.  and  Freeport  Minerals.  Given 
that  the  data  series  each  contained  96  observations,  these  autocorrelations 
do  not  seem  unduly  large,  and  provide  little  evidence  on  which  to  question 
the  adequacy  of  the  originally  chosen  models. 


Insert  Exhibits  3  and  4  about  here 


The  modified  portmanteau  statistics  (Ljung  and  Box  1978) 

$$Q = n(n+2) \sum_{k=1}^{12} (n-k)^{-1} r_k^2$$

are, respectively, 12.73 and 9.91. Neither is significant at the 10
percent level. Given this, and the individually low residual
autocorrelation values, we conclude that our estimated models should provide
an  adequate  base  for  forecasting  future  earnings  of  these  corporations. 
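
The modified portmanteau statistic is simple to compute from a set of residual autocorrelations. The sketch below uses a function name of our own and made-up autocorrelations, since the exhibit values are not reproduced in the text.

```python
def ljung_box(residual_acf, n):
    """Q = n(n+2) * sum over k of r_k^2 / (n - k), for k = 1, ..., m.

    residual_acf holds r_1, ..., r_m; n is the number of residuals.
    """
    return n * (n + 2) * sum(r * r / (n - k)
                             for k, r in enumerate(residual_acf, start=1))


# Made-up residual autocorrelations, for illustration only.
print(ljung_box([0.1, -0.2], n=10))

# Assumption of this sketch: Q is referred to a chi-square distribution
# with 12 - 2 = 10 degrees of freedom (12 lags less two ARMA parameters).
# The tabulated upper 10 percent point for 10 degrees of freedom is 15.99.
CHI2_10PCT_DF10 = 15.99
for q in (12.73, 9.91):  # the two values reported in the text
    print(q, q < CHI2_10PCT_DF10)
```

The degrees of freedom used in the comparison are our assumption; the text reports only that neither statistic is significant at the 10 percent level, which agrees with the comparison above.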

5.  Forecasting 

Having fitted an ARIMA model to the time series $X_t^{(\lambda)}$, one can, standing
at time n, compute h-steps ahead forecasts of $X_{n+h}^{(\lambda)}$ in the usual way.
Forecasts of the untransformed quantity $X_{n+h}$ could then be obtained by
applying the inverse transformation. However, as pointed out by Granger
and Newbold (1976), these will not in general be minimum mean squared error
predictions. Nelson and Granger (1979) show how minimum mean squared error
forecasts can be achieved, on the assumption that the power transformation
yields  a  model  with  normally  distributed  white  noise  errors. 
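
The gap between the naive back-transformed forecast and the conditional mean can be illustrated numerically. The sketch below is not the procedure of Nelson and Granger (1979); it simply approximates the expected value of the back-transformed quantity by a weighted sum over a grid, assuming the forecast error on the transformed scale is normal with known mean and standard deviation. The function names, the grid settings, and the handling of values outside the range of the transformation are all our assumptions.

```python
import math


def inverse_box_cox(z, lam):
    """Inverse of the power transformation (1.3)."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)


def naive_forecast(mean, lam):
    """Back-transform the forecast made on the transformed scale."""
    return inverse_box_cox(mean, lam)


def expected_value_forecast(mean, sd, lam, points=4001, width=8.0):
    """Approximate E[inverse_box_cox(Z)] for Z ~ N(mean, sd^2).

    Grid points where lam * z + 1 <= 0 lie outside the range of the
    transformation; they are dropped and the weights renormalised.
    """
    lo = mean - width * sd
    step = 2.0 * width * sd / (points - 1)
    total, mass = 0.0, 0.0
    for i in range(points):
        z = lo + i * step
        if lam != 0 and lam * z + 1.0 <= 0.0:
            continue
        w = math.exp(-0.5 * ((z - mean) / sd) ** 2)
        total += w * inverse_box_cox(z, lam)
        mass += w
    return total / mass


# Logarithmic case: the conditional mean is exp(mean + sd^2 / 2).
print(naive_forecast(1.0, 0), expected_value_forecast(1.0, 0.5, 0))
print(math.exp(1.0 + 0.5 ** 2 / 2.0))
```

In the logarithmic case the naive forecast is the conditional median rather than the mean, so it understates the minimum mean squared error forecast by the factor exp(sd^2 / 2). For negative values of the transformation parameter the inverse grows without bound as the transformed value approaches its upper limit, so the treatment of that tail matters more than in this example.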

6.  Empirical  Studies 

We  know  of  just  two  studies  in  which  the  value  of  including  a  power 
transformation  in  time  series  models  has  been  checked,  in  terms  of  the 
resulting forecast performance, over a number of real data sets.

Nelson  and  Granger  (1979)  considered  21  published  economic  time 
series.  Forecasting  models  were  built,  with  and  without  the  use  of 
power  transformations,  and  predictions  were  evaluated  over  a  hold-out 
period.  The  results  obtained  were  rather  mixed  and  the  authors  concluded 
that  "the  evidence  when  using  actual  data  is  that  the  extra  inconvenience, 
effort  and  cost  is  such  as  to  make  the  use  of  these  transformations  not 
worthwhile." 

Hopwood  et  al  (1981)  examined  50  quarterly  time  series  of  corporate 
earnings  per  share.  Thus,  while  these  authors  considered  more  series 
than  Nelson  and  Granger,  the  scope  of  coverage  was  far  narrower.   In  this 
particular  study,  however,  judged  by  the  criterion  of  forecasting  accuracy, 
it  was  found  that  incorporating  power  transformations  into  the  model 
proved,  on  the  average,  to  be  worthwhile.  A  noticeable  improvement  in 
forecast  quality  tended  to  follow  when  a  transformation  parameter  was 
included in the ARIMA model. Hopwood et al also concluded that the
indiscriminate  use  of  the  logarithmic  transformation  in  their  seasonal 
models  was  a  poor  strategy.  This  finding  tends  to  reinforce  the  point 
made  in  the  discussion  of  Chatfield  and  Prothero  (1973). 

EXHIBIT 3. RESIDUAL AUTOCORRELATIONS ($r_k$) FROM MODEL FITTED TO
WEYERHAEUSER DATA

EXHIBIT 4. RESIDUAL AUTOCORRELATIONS FROM MODEL FITTED TO
FREEPORT MINERALS DATA